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Abstract 

Representation learning of knowledge 
bases aims to embed both entities and 
relations into a low-dimensional space. 
Most existing methods only consider 
direct relations in representation learning. 

We argue that multiple-step relation paths 
also contain rich inference patterns be¬ 
tween entities, and propose a path-based 
representation learning model. This model 
considers relation paths as translations 
between entities for representation learn¬ 
ing, and addresses two key challenges: (1) 
Since not all relation paths are reliable, 
we design a path-constraint resource allo¬ 
cation algorithm to measure the reliability 
of relation paths. (2) We represent relation 
paths via semantic composition of relation 
embeddings. Experimental results on 
real-world datasets show that, as com¬ 
pared with baselines, our model achieves 
significant and consistent improvements 
on knowledge base completion and re¬ 
lation extraction from text. The source 
code of this paper can be obtained from 
https://github.com/mrlyk423/ 
relation_extraction. 

1 Introduction 

People have recently built many large-scale 
knowledge bases (KBs) such as Freebase, DBpe- 
dia and YAGO. These KBs consist of facts about 
the real world, mostly in the form of triples, e.g., 
(Steve Jobs, FounderOf, Apple Inc.). KBs are 
important resources for many applications such as 
question answering and Web search. Although 
typical KBs are large in size, usually containing 
thousands of relation types, millions of entities 
and billions of facts (triples), they are far from 
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complete. Hence, many efforts have been invested 
in relation extraction to enrich KBs. 

Recent studies reveal that, neural-based repre¬ 
sentation learning methods are scalable and ef¬ 
fective to encode relational knowledge with low¬ 
dimensional representations of both entities and 
relations, which can be further used to extract 


unknown relational facts. TransE (Bordes et al., 


2013) is a typical method in the neural-based ap¬ 


proach, which learns vectors (i.e., embeddings) for 
both entities and relations. The basic idea behind 
TransE is that, the relationship between two enti¬ 
ties corresponds to a translation between the em¬ 
beddings of the entities, that is, h + r « t when 
the triple (h, r, t) holds. Since TransE has issues 
when modeling 1-to-N, N-to-1 and N-to-N rela¬ 


tions, various methods such as TransH (Wang et 


al., 2014 1 and TransR ( Lin et al., 2015| are pro¬ 


posed to assign an entity with different represen¬ 
tations when involved in various relations. 

Despite their success in modeling relational 
facts, TransE and its extensions only take di¬ 
rect relations between entities into considera¬ 
tion. It is known that there are also substan¬ 
tial multiple-step relation paths between entities 
indicating their semantic relationships. The re¬ 
lation paths reflect complicated inference pat¬ 
terns among relations in KBs. For example, the 

. . . , BornlnCity CitylnState 

relation path h - > e\ - > 

StatelnCountry . 

e 2 - > t indicates the relation 

Nationality between h and t, i.e., (h, 
Nationality, t). 

In this paper, we aim at extending TransE to 
model relation paths for representation learning of 
KBs, and propose path-based TransE (PTransE). 
In PTransE, in addition to direct connected rela¬ 
tional facts, we also build triples from KBs us¬ 
ing entity pairs connected with relation paths. 
As shown in Figure [TJ TransE only considers 
direct relations between entities, e.g., h -4 t, 
builds a triple (h, r , t), and optimizes the objec- 













tive h + r = t. PTransE generalizes TransE 
by regarding multiple-step relation paths as con¬ 
nections between entities. Take the 2-step path 
h - 4 - e\ —4 t for example as shown in Figure [I] 
Besides building triples (h,ri,ei) and (ei,r 2 ,t) 
for learning as in TransE, PTransE also builds a 
triple (h,r\ o r 2 ,t), and optimizes the objective 
h + (ri o T2) = t, where o is an operation to join 
the relations n and r 2 together into a unified rela¬ 
tion path representation. 


TransE PTransE 



As compared with TransE, PTransE takes rich 
relation paths in KBs for learning. There are two 
critical challenges that make PTransE nontrivial to 
learn from relation paths: 

Relation Path Reliability. Not all relation 
paths are meaningful and reliable for learning. For 
example, there is often a relation path h - riend > 
ei P - r ° fe5s:LO - n > t, but actually it does not indicate 
any semantic relationship between h and t. Hence, 
it is inappropriate to consider all relation paths in 
our model. In experiments, we find that those re¬ 
lation paths that lead to lots of possible tail enti¬ 
ties are mostly unreliable for the entity pair - . In 
this paper, we propose a path-constraint resource 
allocation algorithm to measure the reliability of 
relation paths. Afterwards, we select the reliable 
relation paths for representation learning. 

Relation Path Representation. In order to take 
relation paths into consideration, relation paths 
should also be represented in a low-dimensional 
space. It is straightforward that the semantic 
meaning of a relation path depends on all relations 
in this path. Given a relation path p = (rr,..., 77 ) 
, we will define and learn a binary operation func¬ 
tion (o) to obtain the path embedding p by re¬ 
cursively composing multiple relations, i.e., p = 
ri o ... o rj. 

With relation path selection and representation, 
PTransE learns entity and relation embeddings by 


regarding relation paths as translations between 
the corresponding entities. In experiments, we 
select a typical KB, Freebase, to build datasets 
and carry out evaluation on three tasks, including 
entity prediction, relation prediction and relation 
extraction from text. Experimental results show 
that, PTransE significantly outperforms TransE 
and other baseline methods on all three tasks. 

2 Our Model 

In this section, we introduce path-based TransE 
(PTransE) that learns representations of entities 
and relations considering relation paths. In TransE 
and PTransE, we have entity set E and relation 
set R, and learn to encode both entities and re¬ 
lations in M fc . Given a KB represented by a set of 
triples S = {(h, r, t)} with each triple composed 
of two entities h, t G E and their relation r £ R. 
Our model is expected to return a low energy score 
when the relation holds, and a high one otherwise. 

2.1 TransE and PTransE 

For each triple (h, r, t), TransE regards the relation 
as a translation vector r between two entity vectors 
h and t. The energy function is defined as 

E(h,r,t) = ||h + r —t||, (1) 

which is expected to get a low score when (h, r, t) 
holds, and high otherwise. 

TransE only learns from direct relations be¬ 
tween entities but ignores multiple-step relation 
paths, which also contain rich inference patterns 
between entities. PTransE take relation paths into 
consideration for representation learning. 

Suppose there are multiple relation paths 
P(h,t ) = {pi,... ,pn} connecting two entities 
h and t, where relation path p = (n,... , 77 ) indi¬ 
cates h —4 ... ^4 t. For each triple ( h,r,t ), the 
energy function is defined as 

G(h,r,t) = E{h,r,t) + E(h,P,t), (2) 

where E(h, r, t ) models correlations between rela¬ 
tions and entities with direct relation triples, as de¬ 
fined in Equation ([!]). E(h, P, t) models the infer¬ 
ence correlations between relations with multiple- 
step relation path triples, which is defined as 

E(h,P,t) = ^ ^ R(p\h,t)E(h,p,t), (3) 

peP(h,t) 

where R(p\h, t ) indicates the reliability of the re¬ 
lation path p given the entity pair (h,t), Z = 













J2 P eP(h t) R(p\hi t) i s a normalization factor, and 
E(h,p,t ) is the energy function of the triple 

( h,p,t ). 

For the energy of each triple ( h,p , t), the com¬ 
ponent R(p\h, t) concerns about relation path reli¬ 
ability, and E(h, p. t) concerns about relation path 
representation. We introduce the two components 
in detail as follows. 


2.2 Relation Path Reliability 

We propose a path-constraint resource allocation 
(PCRA) algorithm to measure the reliability of a 
relation path. Resource allocation over networks 
was originally proposed for personalized recom¬ 
mendation ( |Zhou et al., 2007) ), and has been suc¬ 
cessfully used in information retrieval for measur¬ 


ing relatedness between two objects (Lii and Zhou, 


]2011| ). Here we extend it to PCRA to measure the 
reliability of relation paths. The basic idea is, we 
assume that a certain amount of resource is associ¬ 
ated with the head entity h, and will flow following 
the given path p. We use the resource amount that 
eventually flows to the tail entity t to measure the 
reliability of the path p as a meaningful connection 
between h and t. 

Formally, for a path triple (h,p, t ), we compute 
the resource amount flowing from h to t given the 
path p = {v \,..., ri ) as follows. Starting from h 
and following the relation path p, we can write the 
flowing path as Sq —4- Sj —^ Si, where 
So = h and t e Sj. 

For any entity m 6 Sj, we denote its direct pre¬ 
decessors along relation r t in Sj_ i as Sj_ if-, rn). 
The resource flowing to m is defined as 

Rp{m) = Y lo/l xi Rp(n), (4) 


where Sj(n, •) is the direct successors of n G Sj ;-1 
following the relation 77 , and R p (n) is the resource 
obtained from the entity n. 

For each relation path p, we set the initial re¬ 
source in h as R p (h ) = 1. By performing re¬ 
source allocation recursively from h through the 
path p. the tail entity t eventually obtains the re¬ 
source Rp(t) which indicates how much informa¬ 
tion of the head entity h can be well translated. We 
use R p (t) to measure the reliability of the path p 
given (h, t), i.e., R(p\h, t ) = R p (t). 


2.3 Relation Path Representation 

Besides relation path reliability, we also need to 
define energy function E(h, p, t) for the path triple 


( h,p,t ) in Equation (|2]). Similar with the en¬ 
ergy function of TransE in Equation ([!]), we will 
also represent the relation path p in the embedding 
space. 


COOOl + COOP) = COOP) 

t 1 t 

Composition 
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Figure 2: Path representations are computed by se¬ 
mantic composition of relation embeddings. 

The semantic meaning of a relation path con¬ 
siderably relies on its involved relations. It is thus 
reasonable for us to build path embeddings via se¬ 
mantic composition of relation embeddings. As 
illustrated in Figure [2} the path embedding p is 
composed by the embeddings of BorninCity, 
CitylnState and StatelnCountry. 

Formally, for a path p = (n,..., 77 ), we define 
a composition operation o and obtain path embed¬ 
ding as p = ri o... o 17 . In this paper, we consider 
three types of composition operation: 

Addition (ADD). The addition operation ob¬ 
tains the vector of a path by summing up the vec¬ 
tors of all relations, which is formalized as 

p = ri + ... + iy. (5) 

Multiplication (MUL). The multiplication op¬ 
eration obtains the vector of a path as the cumula¬ 
tive product of the vectors of all relations, which 
is formalized as 


p = ri • ... • 17. 


( 6 ) 


Both addition and multiplication operations are 
simple and have been extensively investigated in 
semantic composition of phrases and sentences 
( (Mitchell and Lapata, 2008 ). 

Recurrent Neural Network (RNN). RNN is a 
recent neural-based model for semantic composi¬ 


tion (Mikolov et al., 2010). The composition op¬ 


eration is realized using a matrix W: 

Ci = /(W[cj_i; 77]), 


(V) 


where / is a non-linearity or identical function, 
and [a; b\ represents the concatenation of two vec- 
































tors. By setting c\ = r 1 and recursively perform¬ 
ing RNN following the relation path, we will fi¬ 
nally obtain p = c n . RNN has also been used 
for representation learning of relation paths in KBs 
(Neelakantan et al., 2015| . 

For a multiple-step relation path triple (h,p, t), 
we could have followed TransE and define the 
energy function as E(h,p,t ) = ||h + p — t||. 
Flowever, since we have minimized 11 h + r — 1 1 1 
with the direct relation triple (h, r, t ) to make sure 
r « t — h, we may directly define the energy func¬ 
tion of ( h,p,t ) as 


E(h,p,t) = ||p (t h) 11 = ||p —r|| = E(p, r), 

( 8 ) 

which is expected to be a low score when the 
multiple-relation path p is consistent with the di¬ 
rect relation r, and high otherwise, without using 
entity embeddings. 

2.4 Objective Formalization 

We formalize the optimization objective of 
PTransE as 

h(S) = Y [L(h,r,t )+i Y R ip\Kt)L{jp,r)\. 

(h,r,t)£S p£P(h,t) 

( 9 ) 

Following TransE, L(h,r,t ) and L(p,r) are 
margin-based loss functions with respect to the 
triple (h, r, t) and the pair (p. r ): 

L(h,r,t) = Y [7 + E(h,r,t)~ E(h',r',t')]+, 


L(P,r)= Y [7 + E(p,t) - E{p,t')\ + , ( 11 ) 

(h,r' ,t )£S — 


where [x]+ = max( 0 , x ) returns the maximum be¬ 
tween 0 and x, 7 is the margin, S is the set of valid 
triples existing in a KB and S - is the set of invalid 
triples. The objective will favor lower scores for 
valid triples as compared with invalid triples. 

The invalid triple set with respect to (h, r , t) is 
defined as 

S' = {(h',r,t)}U{(h,r',t)}U{(h,r,t')}. (12) 

That is, the set of invalid triples is composed of 
the original valid triple ( h,r,t ) with one of three 
components replaced. 


2.5 Optimization and Implementation Details 

For optimization, we employ stochastic gradient 
descent (SGD) to minimize the loss function. We 
randomly select a valid triple from the training set 
iteratively for learning. In the implementation, we 
also enforce constraints on the norms of the em¬ 
beddings h, r, t. That is, we set 


||h|| 2 < 1, ||r|| 2 < 1, ||11| 2 < 1- Vh,r,t. 

(13) 

There are also some implementation details that 
will significantly influence the performance of 
representation learning, which are introduced as 
follows. 

Reverse Relation Addition. In some cases, we 
are interested in the reverse version of a relation, 
which may not be presented in KBs. For exam- 

. ii- 1 BornlnCity 

pie, according to the relation path e\ - > 

CityOfCountry . , . r .1 r 

e 2 < - e 3 we expect to infer the fact 

that (ei, Nationality, e3). In this paper, how¬ 
ever, we only consider the relation paths follow¬ 
ing one direction. Flence, we add reverse relations 
for each relation in KBs. That is, for each triple 
(h, r, t ) we build another (t, r -1 , h). In this way, 
our method can consider the above-mentioned 

. BornlnCity CityOfCountry -1 

path as e\ - > e 2 - > 

for learning. 

Path Selection Limitation. There are usually 
large amount of relations and facts about each en¬ 
tity pair. It will be impractical to enumerate all 
possible relation paths between head and tail en¬ 
tities. For example, if each entity refers to more 
than 100 relations on average, which is common 
in Freebase, there will be billions of 4-step paths. 
Even for 2-step or 3-step paths, it will be time- 
consuming to consider all of them without limita¬ 
tion. For computational efficiency, in this paper 
we restrict the length of paths to at most 3-steps 
and consider those relation paths with the reliabil¬ 
ity score larger than 0.01. 


2.6 Complexity Analysis 

We denote N e as the number of entities, N r as 
the number of relations and K as the vector di¬ 
mension. The model parameter size of PTransE 
is ( N e K + N r K ), which is the same as TransE. 
PTransE follows the optimization procedure in¬ 
troduced by (Bordes et al., 2013) to solve Equa¬ 
tion ©• We denote S as the number of triples 
for learning, P as the expected number of relation 
paths between two entities, and L as the expected 











length of relation paths. For each iteration in opti¬ 
mization, the complexity of TransE is ()(SK) and 
the complexity of PTransE is O(SKPL) for ADD 
and MUL, and 0(SK 2 PL ) for RNN. 


3 Experiments and Analysis 

3.1 Data Sets and Experimental Setting 

We evaluate our method on a typical large-scale 
KB Freebase (Bollacker et al., 20081. In this pa¬ 
per, we adopt two datasets extracted from Free- 
base, i.e., FB15K and FB40K. The statistics of the 
datasets are listed in Table [Tj 

Table 1: Statistics of data sets. 


Dataset 

#Rel 

#Ent 

#Train 

#Valid 

# Test 

FB15K 

FB40K 

1,345 

1,336 

14,951 

39,528 

483.142 

370,648 

50,000 

67,946 

59,071 

96,678 


We evaluate the performance of PTransE and 
other baselines by predicting whether testing 
triples hold. We consider two scenarios: (1) 
Knowledge base completion, aiming to predict the 
missing entities or relations in given triples only 
based on existing KBs. (2) Relation extraction 
from texts, aiming to extract relations between en¬ 
tities based on information from both plain texts 
and KBs. 


3.2 Knowledge Base Completion 

The task of knowledge base completion is to com¬ 
plete the triple (h, r, t ) when one of h, t, r is miss¬ 


ing. The task has been used for evaluation in (Bor- 


|des et al., 201 It |Bordes et al., 2012[ |Bordes et] 


al., 2013). We conduct the evaluation on FB15K, 


which has 483,142 relational triples and 1, 345 re¬ 
lation types, among which there are rich inference 
and reasoning patterns. 

In the testing phase, for each testing triple 
(h, r, t ), we define the following score function for 
prediction 


S(h,r,t) = G(h,r,t) + G(t,r \/i), (14) 


and the score function G(h, r, t ) is further defined 
as 


G(h,r,t ) =| |h + r — t| |+ 

_L Pv(r\p)R(p\h,t)\\p — r 11. 

peP{h,t ) 

(15) 


The score function is similar to the energy func¬ 
tion defined in Section |2.1| The difference is that, 


here we consider the reliability of a path p is also 
related to the inference strength given r, which is 
quantified as Pr(r|p) = Pr(r, p)/Pr(p) obtained 
from the training data. 

We divide the stage into two sub-tasks, i.e., en¬ 
tity prediction and relation prediction. 


3.2.1 Entity Prediction 

In the sub-task of entity prediction, we follow the 
setting in (Bordes et al., 2013). For each test¬ 
ing triple with missing head or tail entity, vari¬ 
ous methods are asked to compute the scores of 
G(h , r, t) for all candidate entities and rank them 
in descending order. 

We use two measures as our evaluation metrics: 
the mean of correct entity ranks and the proportion 
of valid entities ranked in top-10 (Hits@10). As 
mentioned in ( (Bordes et al., 2013] ), the measures 
are desirable but flawed when an invalid triple 
ends up being valid in KBs. For example, when 
the testing triple is {Obama, PresidentOf, 
USA) with the head entity Obama is missing, the 
head entity Lincoln may be regarded invalid for 
prediction, but in fact it is valid in KBs. The eval¬ 
uation metrics will under-estimate those methods 
that rank these triples high. Hence, we can filter 
out all these valid triples in KBs before ranking. 
The first evaluation setting was named as “Raw” 
and the second one as “Filter”. 


For comparison, we select all methods in (Bor- 


des et al., 2013| [Wang et al., 2014 1 as our base¬ 


lines and use their reported results directly since 
the evaluation dataset is identical. 

Ideally, PTransE has to find all possible relation 
paths between the given entity and each candidate 
entity. However, it is time consuming and imprac¬ 
tical, because we need to iterate all candidate en¬ 
tities in \E\ for each testing triple. Here we adopt 
a re-ranking method: we first rank all candidate 
entities according to the scores from TransE, and 
then re-rank top-500 candidates according to the 
scores from PTransE. 

For PTransE, we find the best hyperparameters 
according to the mean rank in validation set. The 
optimal configurations of PTransE we used are 
A = 0.001, 7 = 1, k = 100 and taking L\ as 
dissimilarity. For training, we limit the number of 
epochs over all the training triples to 500. 

Evaluation results of entity prediction are 
shown in Table [2] The baselines include RESCAF 
( Nickel et al., 20lTj) , SE (Bordes et a h, 2011 [ ), 
SME (linear) ( [Bordes et al., 20121 , SME (bilinear) 





























Table 2: Evaluation results on entity prediction. 


Metric 

Mean 

Rank 

Hits@ 10 (%) 

Raw 

Filter 

Raw 

Filter 

RESCAL 

828 

683 

28.4 

44.1 

SE 

273 

162 

28.8 

39.8 

SME (linear) 

274 

154 

30.7 

40.8 

SME (bilinear) 

284 

158 

31.3 

41.3 

LFM 

283 

164 

26.0 

33.1 

TransE 

243 

125 

34.9 

47.1 

TransH 

212 

87 

45.7 

64.4 

TransR 

198 

77 

48.2 

68.7 

TransE (Our) 

205 

63 

47.9 

70.2 

PTransE (ADD, 2-step) 

200 

54 

51.8 

83.4 

PTransE (MUL, 2-step) 

216 

67 

47.4 

77.7 

PTransE (RNN, 2-step) 

242 

92 

50.6 

82.2 

PTransE (ADD, 3-step) 

207 

58 

51.4 

84.6 


( |Bordes et al., 2012] ), LFM ( [Jenatton et al., 20l2| ), 
TransE (Bordes et al., 2013) (original version and 
our implementation considering reverse relations), 


TransH (Wang et al., 2014), and TransR (Lin et al.. 


2015). 


For PTransE, we consider three composition op¬ 
erations for relation path representation: addition 
(ADD), multiplication (MUL) and recurrent neu¬ 
ral networks (RNN). We also consider relation 
paths with at most 2-steps and 3-steps. With the 
same configurations of PTransE, our TransE im¬ 
plementation achieves much better performance 
than that reported in ( [Bordes et al., 2013[ ). 

From Table [2] we observe that: (1) PTransE 
significantly and consistently outperforms other 
baselines including TransE. It indicates that rela¬ 
tion paths provide a good supplement for repre¬ 
sentation learning of KBs, which have been suc¬ 
cessfully encoded by PTransE. For example, since 
both George W. Bush and Abraham Lincoln were 
presidents of the United States , they exhibit simi¬ 
lar embeddings in TransE. This may lead to con¬ 
fusion for TransE to predict the spouse of Laura 
Bush. In contrast, since PTransE models rela¬ 
tion paths, it can take advantage of the relation 
paths between George W. Bush and Laura Bush, 
and leads to more accurate prediction. (2) For 
PTransE, the addition operation outperforms other 
composition operations in both Mean Rank and 
Hits @10. The reason is that, the addition opera¬ 
tion is compatible with the learning objectives of 
both TransE and PTransE. Take h - 4 - e\ t for 
example. The optimization objectives of two di¬ 
rect relations h + ri = ei and ei + r 2 = t can 
easily derive the path objective h + ri + T 2 = t. 
(3) PTransE of considering relation paths with at 
most 2-step and 3-step achieve comparable results. 


This indicates that it may be unnecessary to con¬ 
sider those relation paths that are too long. 

As defined in ( [Bordes et al., 2013) , relations in 
KBs can be divided into various types according 
to their mapping properties such as 1 -to-l, 1 -to- 
N, N-to-1 and N-to-N. Here we demonstrate the 
performance of PTransE and some baselines with 
respect to different types of relations in Table [3] 
It is observed that, on all mapping types of re¬ 
lations, PTransE consistently achieves significant 
improvement as compared with TransE. 

3.2.2 Relation Prediction 

Relation prediction aims to predict relations given 
two entities. We also use FB15K for evaluation. 
In this sub-task, we can use the score function of 
PTransE to rank candidate relations instead of re¬ 
ranking like in entity prediction. 

Since our implementation of TransE has 
achieved the best performance among all base¬ 
lines for entity prediction, here we only com¬ 
pare PTransE with TransE due to limited space. 
Evaluation results are shown in Table [4j where 
we report Hits@l instead of Hits @10 for com¬ 
parison, because Hits@10 for both TransE and 
PTransE exceeds 95%. In this table, we report 
the performance of TransE without reverse rela¬ 
tions (TransE), with reverse relations (+Rev) and 
considering relation paths for testing like that in 
PTransE (+Rev+Path). We report the performance 
of PTransE with only considering relation paths (- 
TransE), only considering the part in Equation ([[} 
(-Path) and considering both (PTransE). 

The optimal configurations of PTransE for re¬ 
lation prediction are identical to those for entity 
prediction: A = 0.001, 7 = 1, k = 100 and taking 
L) as dissimilarity. 

From Table[4]we observe that: (1) PTransE out¬ 
performs TransE+Rev+Path significantly for rela¬ 
tion prediction by reducing 41.8% prediction er¬ 
rors. (2) Even for TransE itself, considering re¬ 
lation paths for testing can reduce 17.3% errors 
as compared with TransE+Rev. It indicates that 
encoding relation paths will benefit for predict¬ 
ing relations. (3) PTransE with only considering 
relation paths (PTransE-TransE) gets surprisingly 
high mean rank. The reason is that, not all entity 
pairs in testing triples have relation paths, which 
will lead to random guess and the expectation of 
rank of these entity pairs is |i?|/2. Meanwhile, 
Hits@l of PTransE-TransE is relatively reason¬ 
able, which indicates the worth of modeling rela- 


















Table 3: Evaluation results on FB15K by mapping properties of relations. (%) 


Tasks 

Predicting Head Entities (Hits@ 10) 

Predicting Tail Entities (Hits@ 10) 

Relation Category 

1 -to-l 

1-to-N 

N-to-1 

N-to-N 

1 -to-l 

1-to-N 

N-to-1 

N-to-N 

SE 

35.6 

62.6 

17.2 

37.5 

34.9 

14.6 

68.3 

41.3 

SME (linear) 

35.1 

53.7 

19.0 

40.3 

32.7 

14.9 

61.6 

43.3 

SME (bilinear) 

30.9 

69.6 

19.9 

38.6 

28.2 

13.1 

76.0 

41.8 

TransE 

43.7 

65.7 

18.2 

47.2 

43.7 

19.7 

66.7 

50.0 

TransH 

66.8 

87.6 

28.7 

64.5 

65.5 

39.8 

83.3 

67.2 

TransR 

78.8 

89.2 

34.1 

69.2 

79.2 

37.4 

90.4 

72.1 

TransE (Our) 

74.6 

86.6 

43.7 

70.6 

71.5 

49.0 

85.0 

72.9 

PTransE (ADD, 2-step) 

91.0 

92.8 

60.9 

83.8 

91.2 

74.0 

88.9 

86.4 

PTransE (MUL, 2-step) 

89.0 

86.8 

57.6 

79.8 

87.8 

71.4 

72.2 

80.4 

PTransE (RNN, 2-step) 

88.9 

84.0 

56.3 

84.5 

88.8 

68.4 

81.5 

86.7 

PTrasnE (ADD, 3-step) 

90.1 

92.0 

58.7 

86.1 

90.7 

70.7 

87.5 

88.7 


tion paths. As compared with TransE, the inferior 
of PTransE-TransE also indicates that entity repre¬ 
sentations are informative and crucial for relation 
prediction. 


Table 4: Evaluation results on relation prediction. 


Metric 

Mean 

Rank 

Hits@ 1 (%) 

Raw 

Filter 

Raw 

Filter 

TransE (Our) 

2.8 

2.5 

65.1 

84.3 

+Rev 

2.6 

2.3 

67.1 

86.7 

+Rev+Path 

2.4 

1.9 

65.2 

89.0 

PTransE (ADD, 2-step) 

1.7 

1.2 

69.5 

93.6 

-TransE 

135.8 

135.3 

51.4 

78.0 

-Path 

2.0 

1.6 

69.7 

89.0 

PTransE (MUL. 2-step) 

2.5 

2.0 

66.3 

89.0 

PTransE (RNN, 2-step) 

1.9 

1.4 

68.3 

93.2 

PTransE (ADD, 3-step) 

1.8 

1.4 

68.5 

94.0 


3.3 Relation Extraction from Text 


Relation extraction from text aims to extract re¬ 
lational facts from plain text to enrich existing 
KBs. Many works regard large-scale KBs as dis¬ 
tant supervision to annotate sentences as training 
instances and build relation classifiers according to 
features extracted from the sentences (jMintz et aQ 
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Surdeanu et al., 2012). All these methods reason 


new facts only based on plain text. TransE was 
used to enrich a text-based model and achieved a 


significant improvement (Weston et al., 20131, and 


so do TransH (Wang et al., 2014) and TransR (Lin 


et al., 20151. In this task, we explore the effective¬ 


ness of PTransE for relation extraction from text. 

We use New York Times corpus (NYT) released 
by ( |Riedel et al., 2010 ) as training and testing data. 
NYT aligns Freebase with the articles in New York 
Times, and extracts sentence-level features such 


as part-of-speech tags, dependency tree paths for 
each mention. There are 53 relations (including 
non-relation denoted as NA) and 121, 034 training 
mentions. We use FB40K as the KB, consisting all 
entities mentioned in NYT and 1,336 relations. 


In the experiments, we implemented the text- 
based model Sm2r presented in (Weston et al., 


2013). We combine the ranking scores from 


the text-based model with those from KB rep¬ 
resentations to rank testing triples, and gener¬ 
ate precision-recall curves for both TransE and 
PTransE. For learning of TransE and PTransE, 
we set the dimensions of entities/relations embed¬ 
dings k = 50, the learning rate A = 0.001, the 
margin 7 = 1.0 and dissimilarity metric as LI. 


We also compare with MIMLRE (Surdeanu et al., 


2012 ) which is the state-of-art method using dis¬ 


tant supervision. The evaluation curves are shown 
in Figure [3] 



Figure 3: Precision-recall curves of Sm2r, TransE 
and PTransE combine with Sm2r. 
















































From Figure [3] we can observe that, by combin¬ 
ing with the text-based model Sm2r, the precision 
of PTransE significantly outperforms TransE espe¬ 
cially for the top-ranked triples. This indicates that 
encoding relation paths is also useful for relation 
extraction from text. 

Note that TransE used here does not consider 
reverse relations and relation paths because the 
performance does not change much. We analyze 
the reason as follows. In the task of knowledge 
base completion, each testing triple has at least 
one valid relation. In contrast, many testing triples 
in this task correspond to non-relation (NA), and 
there are usually several relation paths between 
two entities in these non-relation triples. TransE 
does not encode relation paths during the training 
phase like PTransE, which results in worse perfor¬ 
mance for predicting non-relation when consider¬ 
ing relation paths in the testing phase, and com¬ 
pensates the improvement on those triples that do 
have relations. This indicates it is non-trivial to 
encode relation paths, and also confirms the effec¬ 
tiveness of PTransE. 

3.4 Case Study of Relation Inference 

We have shown that PTransE can achieve high per¬ 
formance for knowledge base completion and re¬ 
lation extraction from text. In this section, we 
present some examples of relation inference ac¬ 
cording to relation paths. 

According to the learning results of PTransE, 
we can find new facts from KBs. As shown in 
Figure [4j two entities Forrest Gump and English 
are connected by three relation paths, which give 
us more confidence to predict the relation between 
the two entities to LanguageOfFilm. 



Figure 4: An inference example in Freebase. 


4 Related Work 


Recent years have witnessed great advances of 
modeling multi-relational data such as social net¬ 
works and KBs. Many works cope with rela¬ 
tional learning as a multi-relational representation 
learning problem, encoding both entities and re¬ 
lations in a low-dimensional latent space, based 


on Bayesian clustering (Kemp et al., 2006 Miller 


et al., 2009; Sutskever et al., 2009 

Zhu, 2012), 

energy-based models (Bordes et al., 2011 

Chen et 

al., 2013, Socher et al., 2013 Bordes et al., 2013, 

Bordes et al., 2014 

), matrix factorization (Singh 

and Gordon, 2008; Nickel et al., 2011, Nickel et 


al., 2012) . Among existing representation mod¬ 
els, TransE ( Bordes et al., 2013] ) regards a relation 
as translation between head and tail entities for 
optimization, which achieves a good trade-off be¬ 
tween prediction accuracy and computational effi¬ 
ciency. All existing representation learning meth¬ 
ods of knowledge bases only use direct relations 
between entities, ignoring rich information in re¬ 
lation paths. 

Relation paths have already been widely con¬ 
sidered in social networks and recommender sys¬ 
tems. Most of these works regard each relation and 
path as discrete symbols, and deal with them us¬ 
ing graph-based algorithms, such as random walks 
with restart (Tong et al., 2006). Relation paths 
have also been used for inference on large-scale 


KBs, such as Path Ranking algorithm (PRA) (Lao 


and Cohen, 2010), which has been adopted for ex¬ 
pert finding (Lao and Cohen, 2010) and informa¬ 
tion retrieval ( ]Lao et al., 2012j ). PRA has also been 
used for relation extraction based on KB structure 


(Lao et al., 2011, Gardner et al., 2013). (Nee- 


lakantan et al., 2015) further learns a recurrent 


neural network (RNN) to represent unseen rela¬ 
tion paths according to involved relations. We 
note that, these methods focus on modeling rela¬ 
tion paths for relation extraction without consid¬ 
ering any information of entities. In contrast, by 
successfully integrating the merits of modeling en¬ 
tities and relation paths, PTransE can learn supe¬ 
rior representations of both entities and relations 
for knowledge graph completion and relation ex¬ 
traction as shown in our experiments. 

5 Conclusion and Future Work 

This paper presents PTransE, a novel representa¬ 
tion learning method for KBs, which encodes re¬ 
lation paths to embed both entities and relations 






































































in a low-dimensional space. To take advantages 
of relation paths, we propose path-constraint re¬ 
source allocation to measure relation path reliabil¬ 
ity, and employ semantic composition of relations 
to represent paths for optimization. We evaluate 
PTransE on knowledge base completion and re¬ 
lation extraction from text. Experimental results 
show that PTransE achieves consistent and signif¬ 
icant improvements as compared with TransE and 
other baselines. 

In future, we will explore the following research 
directions: (1) This paper only considers the infer¬ 
ence patterns between direct relations and relation 
paths between two entities for learning. There are 
much complicated patterns among relations. For 
example, the inference form Queen(e) ===§> 
Female(e) cannot be handled by PTransE. We 
may take advantages of first-order logic to encode 
these inference patterns for representation learn¬ 
ing. (2) There are some extensions for TransE, 
e.g., TransH and TransR. It is non-trivial for them 
to adopt the idea of PTransE, and we will explore 
to extend PTransE to these models to better deal 
with complicated scenarios of KBs. 
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